A simplified approach to high quality music and sound over IP
Present systems for streaming digital audio between devices connected over the internet are limited by a number of compromises. Because of restricted bandwidth and “best effort” delivery, signal compression of one form or another is typical. The buffering of audio data needed to safeguard against delivery uncertainties can cause signal delays of seconds. Audio is in general an unforgiving test of networking: if a single data packet arrives too late, we hear it. Trade-offs in signal quality have been necessary to work around this basic fact and, until now, have weighed against serious musical uses. Beginning in late 1998, audio applications specifically designed for next-generation networks were initiated that could meet the stringent requirements of professional-quality music streaming. A related experiment was begun to explore the use of audio as a network measurement tool. SoundWIRE (sound waves over the internet from real-time echoes) creates a sonar-like ping that displays the qualities of bidirectional connections to the ear. Recent experiments have achieved sustained coast-to-coast audio connections whose round-trip times are within a factor of 2 of the speed-of-light limit. Full-duplex speech over these connections feels comfortable, and in an IIR recirculating form that creates echoes like SoundWIRE, users can experience singing into a transcontinental echo chamber. Three simplifications to audio streaming are suggested in this paper. Compression is eliminated to reduce delay and enhance signal quality. TCP/IP is used in unidirectional flows for its delivery guarantees, eliminating the need for application software to correct transmission errors. QoS puts bounds on the latency and jitter affecting long-haul bidirectional flows.
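The "IIR recirculating form" mentioned in the abstract can be pictured as a feedback comb filter whose delay line is the network round trip. The following is only a minimal local sketch of that idea, not the SoundWIRE implementation: the function name, the 80 ms round-trip time, the 48 kHz sample rate, and the feedback gain of 0.5 are all assumptions chosen for illustration, and no networking is involved.

```python
def recirculating_echo(x, delay_samples, feedback):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay_samples].

    In a networked setting the delay would be the round-trip time of the
    connection; here it is simulated with a plain list (an assumption).
    """
    if delay_samples < 1:
        raise ValueError("delay_samples must be at least 1")
    y = []
    for n, sample in enumerate(x):
        # Output from one round trip ago, or silence before the first return.
        delayed = y[n - delay_samples] if n >= delay_samples else 0.0
        y.append(sample + feedback * delayed)
    return y


# Hypothetical figures: an 80 ms round trip sampled at 48 kHz.
sample_rate = 48000
rtt_seconds = 0.080
delay = round(rtt_seconds * sample_rate)

# A single click followed by silence, long enough for three echoes.
impulse = [1.0] + [0.0] * (3 * delay)
out = recirculating_echo(impulse, delay, feedback=0.5)

# Each echo arrives one round trip later and is scaled by the feedback gain.
echoes = [out[k * delay] for k in range(4)]
print(echoes)
```

With a feedback gain below 1 the echoes decay geometrically, so a longer round trip is heard as a wider spacing between repeats and jitter as unevenness in that spacing.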
Real-time Pitch Tracking in Audio Signals with the Extended Complex Kalman Filter
The Kalman filter is a well-known tool used extensively in robotics, navigation, speech enhancement and finance. In this paper, we propose a novel pitch follower based on the Extended Complex Kalman Filter (ECKF). An advantage of this pitch follower is that it operates on a sample-by-sample basis, unlike the block-based algorithms most commonly used in pitch estimation. It therefore estimates the fundamental frequency (assumed to be the perceived pitch) synchronously with each sample, which makes it well suited to real-time implementation. At the same time, the ECKF tracks the amplitude envelope of the input audio signal. Finally, we test our ECKF pitch detector on a number of cello and double bass recordings played with various ornaments, such as vibrato, portamento and trill, and compare its results with those of the well-known YIN estimator to demonstrate the effectiveness of our algorithm.
A Generative Model for Raw Audio Using Transformer Architectures
This paper proposes a novel way of doing audio synthesis at the waveform level using Transformer architectures. We propose a deep neural network for generating waveforms, similar to WaveNet. The model is fully probabilistic, auto-regressive, and causal, i.e. each generated sample depends only on the previously observed samples. Our approach outperforms a widely used WaveNet architecture by up to 9% at next-step prediction on a similar dataset. Using the attention mechanism, we enable the architecture to learn which audio samples are important for predicting the future sample. We show how causal transformer generative models can be used for raw waveform synthesis. We also show that this performance can be improved by another 2% by conditioning samples over a wider context. The flexibility of the current model to synthesize audio from latent representations suggests a large number of potential applications. The novel approach of using generative transformer architectures for raw audio synthesis is, however, like WaveNet, still far from generating any meaningful music without latent codes or metadata to aid the generation process.
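The causality property described above (each prediction depends only on samples already observed) is usually enforced in attention by masking out future positions before the softmax. The sketch below illustrates that masking on a toy sequence in plain Python. It is not the paper's model: the function names, the all-zero score matrix, and the scalar "values" are assumptions made for illustration, and a real Transformer would compute the scores from learned query and key projections.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax over scores[i][j], with future positions j > i masked.

    scores is a square list of lists; row i holds the raw attention scores
    of position i against every position j in the sequence.
    """
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]              # positions 0..i only
        peak = max(visible)                 # subtract the max for stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        masked_tail = [0.0] * (len(row) - i - 1)
        weights.append([e / total for e in exps] + masked_tail)
    return weights


def attend(weights, values):
    """Weighted sum of scalar values for every position."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


# Toy example: uniform scores over a 4-sample sequence (hypothetical data).
toy_scores = [[0.0] * 4 for _ in range(4)]
toy_weights = causal_attention_weights(toy_scores)

context_a = attend(toy_weights, [1.0, 2.0, 3.0, 4.0])
context_b = attend(toy_weights, [1.0, 2.0, 3.0, 100.0])

# Changing the last sample leaves the context at earlier positions untouched.
print(context_a[:3] == context_b[:3])
```

Because the masked weights are exactly zero, the context at position i cannot leak information from later samples, which is what allows such a model to be trained on whole sequences in parallel and then sampled one step at a time.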